Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2012, Vol. 35, Issue (6): 55-59. doi: 10.13190/jbupt.201206.55.zhangzh


A Packet Format Mining Method Based on Length Semantic Constraints

ZHANG Zhao (张钊), TANG Wen (唐文), WEN Qiaoyan (温巧燕)

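  1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876; 2. Siemens (China) Research Institute, Beijing 100102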
  • Received: 2012-04-09; Revised: 2012-07-24; Publication date: 2012-12-28; Release date: 2013-01-07
  • Corresponding author: ZHANG Zhao, E-mail: 108283@bupt.edu.cn
  • About the authors: ZHANG Zhao (born 1986), male, Ph.D. student, E-mail: 108283@bupt.edu.cn; WEN Qiaoyan (born 1959), female, doctoral supervisor
  • Funding:

    National Natural Science Foundation of China (61202434, 61170270, 61121061); Fundamental Research Funds for the Central Universities (2011RC0505, 2011RCZJ15, 2012RC0612, 2011YB01)

A Length Semantic Constraints Based Approach for Mining Packet Formats of Unknown Protocols

ZHANG Zhao, TANG Wen, WEN Qiao-yan   

  1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. IT Security, Corporate Technology, Siemens (China) Ltd., Beijing 100102, China
  • Received:2012-04-09 Revised:2012-07-24 Online:2012-12-28 Published:2013-01-07
  • Contact: Zhao ZHANG E-mail:108283@bupt.edu.cn
  • Supported by:

    Specialized Research Fund for the Doctoral Program of Higher Education

Abstract:

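To obtain the formats of unknown packets, a packet format mining method based on length semantic constraints is proposed. Building on multiple sequence alignment, the method iteratively applies a length field scanning algorithm between packet segments and within each segment to infer the length fields in a packet and the field (or group of fields) each one refers to, and thereby recovers the hierarchical structure of packets of an unknown protocol. Experimental results show the effectiveness of the new algorithm: for SNMP V1 packets (GetNextRequest and GetResponse), the false negative rate of length field mining is 9.1%, the false positive rates are 16.7% and 23.1%, respectively, and the recovered packet structure is largely consistent with the protocol specification.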
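The within-segment part of the length scan can be illustrated with a small sketch. This is a hypothetical heuristic, not the authors' algorithm: it assumes one-byte length fields that cover the rest of their segment, samples already aligned so that a candidate field sits at the same offset in every sample, and it ignores length fields that refer to a different segment. The names `LengthField` and `scan_segment` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LengthField:
    offset: int          # absolute offset of the (assumed one-byte) length field
    referred_start: int  # absolute offset where the referred region begins
    children: List["LengthField"] = field(default_factory=list)


def scan_segment(samples: List[bytes], base: int = 0) -> List[LengthField]:
    """Hypothetical within-segment length scan (not the paper's algorithm).

    Accepts the first offset p whose byte equals the number of bytes that
    follow it in every sample, provided that number varies across samples
    (if all samples had the same length, any byte that happened to equal it
    would pass). The referred region is then scanned recursively, which
    yields a nesting hierarchy of length fields.
    """
    if len(samples) < 2:
        return []  # cross-sample agreement needs at least two samples
    shortest = min(len(s) for s in samples)
    for p in range(shortest):
        remaining = [len(s) - p - 1 for s in samples]
        values_match = all(s[p] == r for s, r in zip(samples, remaining))
        if values_match and len(set(remaining)) > 1:
            node = LengthField(offset=base + p, referred_start=base + p + 1)
            # Recurse into the referred region (everything after the field).
            node.children = scan_segment([s[p + 1:] for s in samples], base + p + 1)
            return [node]
    return []


# Toy samples: two nested "tag, length, body" layers (hypothetical format).
samples = [bytes([0x30, 3, 0xA1, 1, 0x05]),
           bytes([0x30, 4, 0xA1, 2, 0x05, 0x06])]
tree = scan_segment(samples)
print(tree[0].offset, tree[0].children[0].offset)  # 1 3
```

With real traffic, the alignment step would have to isolate comparable segments first, since variable-length fields ahead of a length field shift its offset from one sample to the next.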

Keywords: length field, packet format, protocol specification mining, protocol reverse engineering, multiple sequence alignment

Abstract:

To obtain the packet formats of unknown protocols, a packet format mining method based on length semantic constraints is proposed. First, multiple sequence alignment is applied to partition a packet into segments. Then, a length identification algorithm scans the segments, iteratively both between and within them, to infer length fields and the field(s) each length field refers to. Finally, the format (hierarchical structure) of the packets is obtained. Experiments demonstrate the effectiveness of this method: for the GetNextRequest and GetResponse messages of Simple Network Management Protocol version 1 (SNMP v1), the false negative rate for length fields is 9.1% in both cases, and the false positive rates are 16.7% and 23.1%, respectively. The recovered packet hierarchy is also approximately consistent with the protocol format specification.
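The segmentation step can be illustrated with pairwise global alignment. This sketch swaps in Needleman-Wunsch on two samples as a simplification of the multiple sequence alignment the method builds on; the scoring (match +1, mismatch -1, gap -1) and the "static"/"dynamic" labeling of aligned runs are assumptions chosen for illustration, and `align` and `segments` are hypothetical names.

```python
from typing import List, Optional, Tuple

Column = Tuple[Optional[int], Optional[int]]  # (byte from a, byte from b); None = gap


def align(a: bytes, b: bytes, match: int = 1, mismatch: int = -1,
          gap: int = -1) -> List[Column]:
    """Needleman-Wunsch global alignment of two byte strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell along an optimal path.
    cols: List[Column] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            cols.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            cols.append((a[i - 1], None))  # gap in b
            i -= 1
        else:
            cols.append((None, b[j - 1]))  # gap in a
            j -= 1
    cols.reverse()
    return cols


def segments(cols: List[Column]) -> List[Tuple[str, int]]:
    """Group aligned columns into runs: 'static' where both bytes agree,
    'dynamic' otherwise. Returns (kind, number_of_columns) pairs."""
    runs: List[Tuple[str, int]] = []
    for x, y in cols:
        kind = "static" if x is not None and x == y else "dynamic"
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1] + 1)
        else:
            runs.append((kind, 1))
    return runs


cols = align(b"HDRab\x00", b"HDRxyz\x00")
print(segments(cols))  # [('static', 3), ('dynamic', 3), ('static', 1)]
```

Static runs act as candidate segment boundaries, and the dynamic runs between them are where a length scan would look for fields whose values track the sizes of other regions.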

Key words: length field, packet format, protocol specification mining, protocol reverse engineering, multiple sequence alignment
